(57) Abstract



A method for generating an application specific integrated circuit including providing a software configurable semiconductor integrated 
circuit having a fixed hardware architecture that includes a plurality of task engines. A high-level language compiler is provided that compiles 
a user created high-level language program that defines the application specific integrated circuit. The compiler parses the program into a
plurality of microtasks for instructing the plurality of task engines to implement the application specific integrated circuit. 
Method of Generating Application Specific Integrated Circuits
Using a Programmable Hardware Architecture 



Field of the Invention 

The invention relates generally to the field of application specific integrated circuits. In 
particular, the invention relates to methods of generating application specific integrated circuits 
using a fixed but configurable hardware architecture. 

Background of the Invention 

Custom integrated circuits are widely used today in the electronics industry. The demand 
for custom integrated circuits is rapidly increasing because of a dramatic growth in the demand 
for highly specific consumer electronics and a trend towards increased product functionality.
Also, the use of custom integrated circuits is advantageous because they reduce system 
complexity and, therefore, lower manufacturing costs, increase reliability and increase system 
performance. 

There are numerous types of custom integrated circuits. One type is programmable logic 
devices (PLDs) including field programmable gate arrays (FPGAs). FPGAs are designed to be 
programmed by the end user using special-purpose equipment. Programmable logic devices are,
however, undesirable for many applications because they operate at relatively slow speeds, have 
relatively low capacity, and have relatively high cost per chip. 

Another type of custom integrated circuit is application-specific integrated circuits 
(ASICs) including gate-array based and cell-based ASICs which are often referred to as 
"semicustom" ASICs. Semicustom ASICs are programmed by either a) defining the
placement and interconnection of a collection of predefined logic cells which are used to create a 
mask for manufacturing the IC (cell-based) or b) defining the final metal interconnection layers to 
lay over a predefined pattern of transistors on the silicon (gate-array-based). Semicustom ASICs 
can achieve high performance and high integration but can be undesirable because they have 
relatively high design costs, have relatively long design cycles (time it takes to transform given
consuming and expensive iterations. Customer modifications or problems occurring during the
design cycle may require costly redesign and long delays. 

Because of the trend towards increased product functionality in the electronics industry, the
complexity of custom integrated circuits is rapidly increasing. The level of skill required to
generate custom integrated circuits and the design cycle time are also rapidly increasing.
Consequently, prior art methods of generating custom integrated circuits are becoming 
increasingly inadequate. There currently exists a need for a method of generating application 
specific integrated circuits that reduces the design cycle time of custom integrated circuits. There 
also exists a need for a method of generating application specific integrated circuits that allows for 
modification during the design cycle. 

Summary of the Invention 

It is therefore a principal object of this invention to greatly reduce the number of steps that 
it takes to produce an application specific integrated circuit and, therefore, to greatly reduce the 
design cycle time and the manufacturing cost. It is another object of this invention to provide a
method of generating an application specific integrated circuit that easily implements design 
modifications during the design cycle. 

It is another object of this invention to reduce the engineering skill level required to create
an application specific integrated circuit. A principal discovery of the present invention is that a
custom integrated circuit can be produced by programming a fixed architecture integrated circuit
using a high-level object oriented programming language. It is another principal discovery that a
custom integrated circuit can be produced in as little as two steps comprising describing the
desired functionality of the integrated circuit in an object oriented programming language and
compiling the object oriented program onto the fixed programmable architecture. 

It is yet another principal discovery that a compiler can be used to perform high level 
synthesis to map specific functions of an application onto task engines comprised of set data
paths in the PSA IC, thereby eliminating time intensive analysis of possible data paths.

Accordingly, the present invention features a method for generating an application specific 
integrated circuit that includes providing a software configurable semiconductor integrated circuit 
engine. At least one task engine may be included that programs an input and an output interface
for accepting data with a communication protocol. The hardware architecture may also include a 
program memory associated with each of the task engines for storing microtasks for instructing 
the task engines. 

A high-level language compiler compiles a user created high-level language program that 
defines the application specific integrated circuit. The compiler parses the program into a 
plurality of microtasks that instruct the plurality of task engines to implement the application 
specific integrated circuit. The apparatus may also include a software simulator that evaluates 
particular architectures. 
The component libraries define a set of component objects that are used as building blocks
in developing an application for the PSA IC. The component objects are class definitions derived 
from a base class. Each component object has predefined input and output channels. The 
component objects may include code for compression, coding, digital audio/video, connectivity, 
and switching functions. For example, component objects may include code for a JPEG codec, an
MPEG codec, an ATM switch, or a USB bus slave. 
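
The class structure described above can be pictured with a minimal sketch. It is written in Python purely for illustration and does not reproduce the actual component libraries; the names `Channel`, `Component`, `Inverter`, `din` and `dout` are hypothetical, as is the 8-bit channel width.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """A named connection point on a component (hypothetical type)."""
    name: str
    width_bits: int


class Component:
    """Hypothetical base class from which component objects derive."""
    inputs = ()    # predefined input channels
    outputs = ()   # predefined output channels

    def process(self, **samples):
        raise NotImplementedError("derived components define their behavior")


class Inverter(Component):
    """Toy derived component with one 8-bit input and one 8-bit output."""
    inputs = (Channel("din", 8),)
    outputs = (Channel("dout", 8),)

    def process(self, din):
        # Bitwise inversion, masked to the assumed 8-bit channel width.
        return {"dout": ~din & 0xFF}


result = Inverter().process(din=0x0F)
```

A real component object such as a JPEG codec would carry far more behavior; the sketch only shows the derivation from a common base class and the fixed channel declarations.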

The application framework libraries define a complete description of a system that can be
run on a PSA IC. Examples of such systems are code multimedia, video conferencing, set top
box and audio systems. Specifically, an application framework library may include code for image
processing, user control and image compression functions of a digital camera. 

Using application libraries 58 reduces the design cycle time and the level of skill required 
to produce the custom integrated circuit. Using the application libraries also improves system 
performance because the libraries are optimized for particular target task engines. In addition, 
using application libraries makes the system development more intuitive because users can design 
a system from high level blocks.

The user program 56 containing the custom code that defines the system specification for 
the ASIC and the application libraries 58 is parsed by the PSA compiler 52. The PSA compiler
52 converts the user program 56 into a program image 60 of the system specifications for the 
ASIC that comprises a series of microtasks. Each microtask is a Very Long Instruction Word 
(VLIW) program for a target task engine in the PSA IC 54. 

The PSA analysis system 62 is a graphical analysis environment that allows the user to 
analyze certain characteristics of their application running on a target PSA IC. It includes a PSA 
simulator, PSA configuration tool, and graphical user interface (GUI) environment. The PSA 
simulator is a software model of a specific configuration of the PSA IC hardware that can execute 
the program image 60 produced by the PSA compiler 52. The PSA configuration tool allows the 
user to select between different configurations of the PSA IC. The configuration information is
used by the PSA compiler 52 and by the simulator to accurately reflect the characteristics of the 
target PSA IC 54. The graphical user environment allows the user to interact with the different 
analysis tools through a graphical windowing interface. 
global analyzer 106 allocates memory segments for communicating in shared memory and for
local storage required by the task in private memory. The global analyzer 106 also allocates 
memory locations based on data lifetime analyses. 

The global analyzer 106 also performs datapath analysis. The global analyzer 106 
determines the optimal task engines on which to execute the functionality of a given task. This is
done by analyzing the required operations of the task (addition, multiplication, etc.) and matching 
those to a task engine that can optimally support the collection of operations required by the task. 
For example, if a task includes a number of multiplication operations, the global analyzer 106 
maps the task onto a task engine that includes a multiplier as one of its computation units. 
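
The matching step described above can be sketched as follows. This is an illustrative assumption rather than the global analyzer's actual algorithm: the function name `select_task_engine`, the engine names, and the tie-breaking rule (prefer the engine with the fewest unused computation units) are all hypothetical.

```python
def select_task_engine(required_ops, engines):
    """Pick an engine that supports every required operation.

    engines maps an engine name to the set of operations its computation
    units provide. Among capable engines, the one with the fewest unused
    operations is chosen (an assumed cost rule, not taken from the source).
    """
    required = set(required_ops)
    candidates = [
        (len(units - required), name)
        for name, units in engines.items()
        if required <= units
    ]
    if not candidates:
        raise ValueError("no task engine supports %s" % sorted(required))
    return min(candidates)[1]


# Hypothetical engine inventory.
ENGINES = {
    "shift_engine": {"shift"},
    "alu_engine": {"add", "shift"},
    "mac_engine": {"add", "multiply", "shift"},
}

multiply_task_engine = select_task_engine(["multiply", "add"], ENGINES)
shift_task_engine = select_task_engine(["shift"], ENGINES)
```

A task containing multiplications lands on the engine with a multiplier, while a pure shifting task is kept on the simplest datapath that can serve it.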

The global analyzer 106 also performs task scheduling. The global analyzer 106
determines the relationships between tasks and inserts code to efficiently order the execution of
the tasks so as to optimize the control flow and data relationships between tasks. In addition, the
global analyzer 106 performs system level optimization and allocation. That is, the global
analyzer 106 processes the entire program (rather than individual modules) and makes global
decisions and optimizations for program execution and memory references.
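
One simple way to picture ordering tasks by their data relationships is a topological sort of the task dependency graph. The sketch below uses Python's standard `graphlib` module; it is an assumption about how such an ordering could be computed, not the scheduling method of the global analyzer 106, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter


def schedule(dependencies):
    """Return an execution order in which every task follows its producers.

    dependencies maps each task to the set of tasks whose output it consumes.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical task network for a small imaging pipeline.
DEPENDENCIES = {
    "capture": set(),
    "filter": {"capture"},
    "compress": {"filter"},
    "display": {"filter"},
}

order = schedule(DEPENDENCIES)
```

Tasks with no dependency between them, such as `compress` and `display` here, may appear in either relative order, which is where a real scheduler would apply its own optimization criteria.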

The global analyzer 106 also inserts direct memory references, rather than memory 
addressing, in the instruction set. Direct memory addresses can be used because the PSA IC 
hardware architecture implements the overall system including the memory used for data storage. 
Direct memory addressing is advantageous because it minimizes the number of required
instructions and enables asymmetric pipelining of instructions. Finally, the global analyzer 106
generates intermediate forms that are data structures which define a network of tasks where each 
task is represented by a control/dataflow graph. 

A task analyzer and code generator 108 processes the tasks allocated to specific task 
engines and generates a series of microtask definitions that run on particular task engines. The 
microtasks are atomic execution units that are non-preemptive. That is, once they are executed,
they will run to completion without being interrupted. In one embodiment, the task analyzer and 
code generator 108 uses program decomposition techniques to decompose tasks into threads and 
then, through data flow analysis, decompose the threads into microtasks. The task analyzer and 
152 with predefined communication mechanisms and memory structures. The task engines 152
are high performance data paths that are programmed with a Very Long Instruction Word 
(VLIW). Programming with VLIWs allows the compiler to select the optimum task engine to 
perform some part of the ASIC operation. 

The task engines 152 are optimized for different types of tasks. Some tasks will require 
advanced processing capabilities and, therefore, will require ALUs and multipliers in the task
engine's datapath. Other tasks may simply be used to transform sequences of data and, therefore, 
will require only a simple datapath consisting of little more than a shift register. 

The PSA hardware 150 can communicate with a diverse range of other circuits through 
various protocols. The inputs and outputs of the PSA hardware 150 are programmed using 
special input/output task engines 154 (I/O task engines) that interface with input/output memory 
(I/O memory) 156. The I/O memory 156 is random-access memory (RAM) that stores data that 
is input from and output to external pin modules (not shown) of input/output sections 160 (I/O 
sections). In one embodiment, there are at least two I/O memories 156 interfacing with at least
two I/O sections 160.

In one embodiment, the external pin modules are configurable through pin configuration 
registers. The I/O task engines 154 associated with the I/O memory 156 and the I/O section 160 
can be programmed to write new values into these configuration registers in order to configure 
the operation of the pin module. 

In one embodiment, the pin-to-memory-location mapping of the PSA hardware 150 is
predefined (i.e., not programmable). The predefined mapping allocates one or more pins into a
word memory location. Pins may be bundled so that, for example, 16 pin values may be mapped 
into a single word. This enables programs to allocate sets of pins to commonly bundled values 
(e.g., data busses). 
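
The bundling of pins into a word can be illustrated with a short sketch. The bit ordering (pin 0 in the least significant bit) and the function names are assumptions made for the example; the source states only that, for example, 16 pin values may be mapped into a single word.

```python
WORD_BITS = 16  # bundle size taken from the 16-pin example in the text


def pack_pins(pin_levels):
    """Pack a list of pin levels (0 or 1) into one word, pin 0 in bit 0."""
    if len(pin_levels) > WORD_BITS:
        raise ValueError("at most %d pins fit in one word" % WORD_BITS)
    word = 0
    for bit, level in enumerate(pin_levels):
        if level not in (0, 1):
            raise ValueError("pin levels must be 0 or 1")
        word |= level << bit
    return word


def unpack_pins(word, count=WORD_BITS):
    """Recover the individual pin levels from a packed word."""
    return [(word >> bit) & 1 for bit in range(count)]


bus_word = pack_pins([1, 0, 1, 1])
bus_levels = unpack_pins(bus_word, count=4)
```

With such a mapping, a program can read or write an entire data bus with a single word access instead of handling each pin separately.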

The task engines 152, including the I/O sections 160, communicate by the shared memory 
modules 158. The shared memory modules 158 are typically random-access memories (RAM) 
that can be written to and read from by the task engines. Communication between tasks is 
performed by having one task write to and another task read from the shared memory 158.
bus 210 allows the task engine 200 to receive tasks from the scheduler and other task engines
connected to the queue bus 210. 

A task controller 212 that is in communication with the task queue 208 and the data bus 
206 manages the execution of microtasks that are stored on the task queue 208. The task 
controller 212 includes a program counter (not shown) that is updated to include the next 
microtask to run and manages the loading and execution of instructions for that microtask. 
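
The behavior of the task controller described above can be modeled in software as a queue that is drained one microtask at a time, each microtask running to completion before the next one starts. This is a simplified sketch under that assumption; the class name `TaskController`, its methods, and the instruction strings are hypothetical and do not describe the hardware implementation.

```python
from collections import deque


class TaskController:
    """Toy model of a controller that runs queued microtasks to completion."""

    def __init__(self):
        self.task_queue = deque()
        self.trace = []  # (microtask name, program counter, instruction)

    def enqueue(self, name, instructions):
        self.task_queue.append((name, list(instructions)))

    def run(self):
        while self.task_queue:
            # Select the next microtask, then issue all of its instructions
            # before looking at the queue again (non-preemptive execution).
            name, instructions = self.task_queue.popleft()
            for program_counter, instruction in enumerate(instructions):
                self.trace.append((name, program_counter, instruction))


controller = TaskController()
controller.enqueue("A", ["load", "multiply"])
controller.enqueue("B", ["store"])
controller.run()
```

The trace shows every instruction of microtask `A` issued before any instruction of microtask `B`, which is the run-to-completion property the text attributes to microtasks.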

Instruction memory 214 is in communication with the task controller 212. The instruction
streams for each microtask assigned to run on a particular task engine 200 are loaded into the
instruction memory for the particular task engine. In one embodiment, the instruction memory 
214 is non-volatile memory such as flash memory. 

An address generation unit 216 takes memory references from the task controller 212 and 
converts them to actual memory locations in either the private data memory 202 or in shared 
memories 203. There are two advantages to this approach: shorter immediate addressing saves
program memory and vector address generation increases performance and saves execution 
resources. This causes the data in those memory locations to be made available to the appropriate 
data bus. 
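
A rough software analogy for the address generation unit is a function that turns a short reference (a memory space plus a small offset) into a full address, together with a helper that expands one reference into a sequence of addresses. The base addresses, the stride parameter and all names below are hypothetical; the source does not specify the address map or how vector address generation is encoded.

```python
# Hypothetical base addresses for the two kinds of memory.
PRIVATE_BASE = 0x0000
SHARED_BASE = 0x8000


def resolve(space, offset):
    """Convert a short (space, offset) reference into a full address."""
    bases = {"private": PRIVATE_BASE, "shared": SHARED_BASE}
    if space not in bases:
        raise ValueError("unknown memory space: %r" % (space,))
    return bases[space] + offset


def vector_addresses(space, offset, stride, count):
    """Expand one reference into `count` addresses spaced `stride` apart."""
    start = resolve(space, offset)
    return [start + index * stride for index in range(count)]


shared_address = resolve("shared", 4)
private_vector = vector_addresses("private", 2, stride=4, count=3)
```

Because the instruction only needs to carry the short offset, and a single reference can stand for a whole run of addresses, the sketch reflects the two advantages named in the text: smaller program memory and less work for the execution resources.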

Different task engines are designed to have different configurations and types of
computation units in communication with the data bus 206. A computation unit is an on-chip 
block which performs some computational function. Typical examples of computation units are 
multipliers 218, data processing units (DPUs) 220, and shifter units 222.
What is claimed is:

1. A method for generating an application specific integrated circuit, the method comprising:

a) providing a software configurable semiconductor integrated circuit having a fixed
hardware architecture that includes a plurality of task engines and configurable
I/O;

b) providing a high-level language compiler; and

c) compiling a user created high-level language program with the compiler that
defines the application specific integrated circuit, the compiler parsing the program
into a plurality of microtasks for instructing the plurality of task engines to
implement the application specific integrated circuit.

2. The method of claim 1 further comprising providing the user created high-level language
program that defines the application specific integrated circuit.

3. The method of claim 1 wherein the high-level language is an object oriented or
component-based programming language.

4. The method of claim 3 further comprising providing at least one object oriented class
library that is compilable with the high-level language compiler to generate microtasks for
instructing task engines to perform algorithms, data communications or data manipulation.

5. The method of claim 3 wherein the object oriented programming language is a Java
programming language.

6. The method of claim 1 wherein each of the microtasks comprises a Very Long Instruction
Word program that instructs a task engine.

7. The method of claim 1 further comprising loading the microtasks into program memory
associated with the task engines.

8. The method of claim 1 wherein the compiler is optimized to select an optimum task engine
for each of the microtasks.
The apparatus of claim 13 wherein the fixed hardware architecture further comprises a
program memory associated with each of the task engines for storing microtasks for 
instructing the task engines. 

The apparatus of claim 13 wherein each of the plurality of task engines is programmable
with a unique Very Long Instruction Word instruction set.

A method for generating an application specific integrated circuit, the method comprising: 

a) providing a software configurable semiconductor integrated circuit having a fixed 
hardware architecture including a plurality of task engines; 

b) providing a high-level language compiler; 

c) providing a user created high-level language program for the high-level language 
compiler that defines the application specific integrated circuit; and 

d) compiling the program with the compiler, the compiler parsing the program into a 
plurality of microtasks for instructing the plurality of task engines to implement the 
application specific integrated circuit. 